Information Retrieval System in Bahasa Indonesia Using Latent Semantic Indexing and Semi-Discrete Matrix Decomposition
Abstract
This paper explores the use of Latent Semantic Indexing (LSI) and Semi-Discrete Matrix Decomposition (SDD) in an information retrieval system for Bahasa Indonesia. The method takes advantage of the implicit higher-order structure in the association of terms with documents ("semantic structure") to improve the detection of relevant documents on the basis of terms found in Indonesian-language queries. LSI is a promising enhancement to the Vector Space Model of information retrieval: it uses statistically derived relationships between documents, rather than individual words, for retrieval. The particular technique used is the Semi-Discrete Matrix Decomposition (SDD), based on Kolda's research [5], which requires significantly less storage and is faster at query processing than the Singular Value Decomposition (SVD). Using Kolda and O'Leary's SDDPACK software [7], an implementation of SDD-based LSI was built in Visual Basic 6.0, Matlab 6.5 and Visual C++ and tested on the collection of student research documents at the Computer Science Department of IPB. The performance of SDD with stemmed terms is compared against its performance with unstemmed terms.
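To make the storage argument concrete, the following is a minimal sketch of a semi-discrete decomposition, assuming the commonly described greedy scheme in which a term-document matrix A is approximated by a sum of k terms d·x·yᵀ, where x and y contain only -1, 0 and 1. It is not the SDDPACK code used in the paper; the function names `best_ternary` and `sdd`, the starting vector, and the fixed number of inner iterations are illustrative assumptions.

```python
def best_ternary(s):
    """Return x with entries in {-1, 0, 1} maximizing (x . s)^2 / (x . x).

    The optimum keeps the signs of the J largest |s_i| for the best J,
    so it is enough to scan J = 1..len(s) over the sorted magnitudes.
    """
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_j, best_val = j, val
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x


def sdd(a, k, inner_iters=10):
    """Greedy rank-k semi-discrete approximation of matrix a (list of rows).

    Returns (triples, residual), where triples is a list of (d, x, y) and
    a is approximately sum(d * outer(x, y)) over all triples.
    inner_iters is an assumed fixed iteration count, not a tuned value.
    """
    m, n = len(a), len(a[0])
    r = [list(row) for row in a]  # residual, starts as a copy of a
    triples = []
    for step in range(k):
        # Assumed starting guess: a unit vector that cycles through columns.
        y = [0] * n
        y[step % n] = 1
        for _ in range(inner_iters):
            # Fix y, pick the best x; then fix x, pick the best y.
            s = [sum(r[i][j] * y[j] for j in range(n)) for i in range(m)]
            x = best_ternary(s)
            t = [sum(r[i][j] * x[i] for i in range(m)) for j in range(n)]
            y = best_ternary(t)
        # Optimal scalar weight for the chosen x and y.
        xry = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d = xry / (sum(v * v for v in x) * sum(v * v for v in y))
        # Subtract the new term from the residual.
        for i in range(m):
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        triples.append((d, x, y))
    return triples, r


# Toy 3-term by 2-document matrix that is exactly one ternary term.
a = [[2, 2], [-2, -2], [0, 0]]
triples, residual = sdd(a, k=1)
```

Because each x and y entry takes one of three values, it can be stored in two bits, and only the k weights d need floating-point storage. This is the source of the storage savings over the SVD that the abstract refers to.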
Similar resources
A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval
The vast amount of textual information available today is useless unless it can be effectively and efficiently searched. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. Latent Semantic Indexing represents documents by app...
Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition
With the electronic storage of documents comes the possibility of building search engines that can automatically choose documents relevant to a given set of topics. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. There...
Clustering and Latent Semantic Indexing Aspects of the Singular Value Decomposition
This paper discusses clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). The purpose of this paper is twofold. The first is to give an explanation on how and why the singular vectors can be used in clustering. And the second is to show that the two seemingly unrelated SVD aspects actually originate from the same source: related vertices tend to be mo...
Clustered SVD strategies in latent semantic indexing
The text retrieval method using Latent Semantic Indexing (LSI) technique with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...
Clustered SVD strategies in latent semantic indexing
The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...